A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms

Authors

Abstract

Reinforcement learning (RL) is a paradigm where an agent learns to accomplish tasks by interacting with the environment, similar to how humans learn. RL is therefore viewed as a promising approach to achieve artificial intelligence, as evidenced by its remarkable empirical successes. However, many RL algorithms are theoretically not well understood, especially in the setting where function approximation and off-policy sampling are employed. My thesis [1] aims at developing a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis. Since most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation, the first part of the thesis is dedicated to the analysis of general SA involving a contraction operator, under Markovian noise. We develop a Lyapunov approach in which we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds for various RL algorithms in the tabular setting (cf. Part II of the thesis) and when using function approximation (cf. Part III of the thesis), which in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control. The main body of this document provides an overview of the contributions of my thesis.
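To make the abstract's claim that RL algorithms are stochastic approximation schemes for a Bellman equation concrete, here is a minimal sketch, not the thesis's own construction. It runs TD(0) along a single trajectory (so the noise is Markovian) and compares the result with the fixed point of the Bellman operator, which is a contraction. The two-state chain, its rewards, the discount factor, and the step-size schedule are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical 2-state Markov reward process (illustrative numbers, not from the thesis).
P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities
R = [1.0, 0.0]                 # reward received in each state
GAMMA = 0.9                    # discount factor


def bellman(v):
    """Bellman operator T(v) = R + GAMMA * P v, a GAMMA-contraction in the sup norm."""
    return [R[s] + GAMMA * sum(P[s][t] * v[t] for t in range(2)) for s in range(2)]


def fixed_point(iters=2000):
    """Approximate the fixed point of T by applying it repeatedly."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = bellman(v)
    return v


def td0(steps=200_000, seed=0):
    """TD(0) along one trajectory: stochastic approximation with Markovian noise."""
    rng = random.Random(seed)
    v = [0.0, 0.0]
    s = 0
    for k in range(steps):
        # Sample the next state from the chain; consecutive samples are correlated.
        s_next = 0 if rng.random() < P[s][0] else 1
        step = 50.0 / (k + 500.0)  # diminishing step size (an assumed schedule)
        # Noisy, asynchronous evaluation of T(v) - v at the visited state only.
        v[s] += step * (R[s] + GAMMA * v[s_next] - v[s])
        s = s_next
    return v


if __name__ == "__main__":
    print("Bellman fixed point:", fixed_point())
    print("TD(0) estimate:     ", td0())
```

The gap between the two printed vectors after a given number of steps is the kind of quantity a finite-sample bound controls; this sketch only observes it empirically and proves nothing about it.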

Similar resources

A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can prov...

A Framework for Aggregation of Multiple Reinforcement Learning Algorithms

Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). SDM is very common and important in various realistic applications, especially in automatic control problems. The quality of an SDM depends on (discounted) long-term rewards rather than the instant rewards. Due to delayed feedback, SDM tasks ...

Algorithms for Learning Finite Automata from Queries: A Unified View

In this survey we compare several known variants of the algorithm for learning deterministic finite automata via membership and equivalence queries. We believe that our presentation makes it easier to understand what is going on and what the differences between the various algorithms mean. We also include the comparative analysis of the algorithms, review some known lower bounds, prove a new one, ...

A Unified Approach for Design of Lp Polynomial Algorithms

By summarizing Khachiyan's algorithm and Karmarkar's algorithm for linear programming (LP), a unified methodology for the design of polynomial-time algorithms for LP is presented in this paper. A key concept is the so-called extended binary search (EBS) algorithm introduced by the author. It is used as a unified model to analyze the complexities of the existing modern LP algorithms and possibly, help ...

A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Many reinforcement learning algorithms, like Q-Learning or R-Learning, correspond to adaptive methods for solving Markovian decision problems in infinite-horizon when no model is available. In this article we consider the particular framework of non-stationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total reward criterion and the avera...

Journal

Journal title: Performance Evaluation Review

Year: 2022

ISSN: 1557-9484, 0163-5999

DOI: https://doi.org/10.1145/3579342.3579346